Skip to content

TOC-Context Free Language

Context Free Language

A context-free grammar (CFG) is a 4-tuple (N,T,R,s), where

  • N is a finite set called variables;
  • T is a finite set called terminals;
  • R is a finite set of production rules with each in the form
At1t2tn

where A is a variable and t1t2tn is a string of variables and terminals; and

  • sN is the start variable.

+ Definition: CFL

A language is context-free if exactly all strings in the language can be produced by a context-free grammar.

+ Theorem

A language is context-free if and only if it is recognized by a PDA.

Ambiguous

A parse tree shows the structure of a string (sentence). However, CFG is strongly enough to create ambiguous.

+ Definition: Ambiguous

A grammar is ambiguous if a string in the language defined by the grammar can be presented by multiple parse trees with different structures.

Also, checking whether a CFG is ambiguous is undecidable. Meaning that there is NO algorithm / machine which can tell an arbitrary CFG is ambiguous or not.

Properties

Equivalence

+ Theorem

A language is context-free if and only if it is recognized by a PDA.

Let L be an arbitrary context-free language. By the definition of CFL, L has a CFG G generating it. Then, we prove the theorem by introducing a conversion algorithm from G to a PDA P which recognizes L and vice versa.

CFG to PDA

To construct the PDA P from G, let w be a string in L.

  • G generates w by a sequence of productions.
  • We construct P to simulate this procedure.
  • P generates some strings and pushes them into the stack.
  • Then, P tries to match the stack with w.

In detail, P does the following things.

  1. Push $ and the start variable E into the stack ($ first and E second).
  2. Repeat the followings.
    • If a variable A is on top of the stack, nondeterministically select the rules starts from A and replace A by the RHS of the rule (in the stack).
    • If a terminal a is on top of the stack, check the next input symbol. If yes, continue; otherwise reject.
    • If $ is on the top, accept the string if all symbols of w have been read; reject otherwise.

Formally, let G=(N,T,R,s) be a CFG. Construct the following transitions.

  1. δ(qstart,ϵ,ϵ)=(qE,$)
  2. δ(qE,ϵ,ϵ)=(qloop,E)
  3. δ(qloop,ϵ,$)=(qaccept,ϵ)
  4. For each production rule r in the form of At1t2tn, where n2; construct the followings:
      1. δ(qloop,ϵ,A)=(r1,tn);
      1. δ(ri,ϵ,ϵ)=(ri+1,tni) for 1in2
      1. δ(rn1,ϵ,ϵ)=(qloop,t1)
  5. For each production rule At,δ(qloop,a,a)=(qloop,t).
  6. For each terminal aT,δ(qloop,a,a)=(qloop,ϵ).

The PDA is called parser, which is used in compilers.

PDA to CFG

Now let's convert an arbitrary PDA P to a CFG G. First, we modify P.

  • P has a single final state qf. If it has multiple accepting states, create some artificial ϵ transitions to one of the accepting states.
  • The stack of P is always empty when a string is accepted. If the stack is non-empty, create an artificial state to pop out everything in the state.
  • Every transition either pushes a symbol or pops out a symbol (not both, not none). If a transition is δ(qi,a,x)=(qi,y), create an artificial state qk and replace the transition by δ(qi,a,x)=(qk,ϵ) and (qk,ϵ,ϵ)=(qj,y)

Formally, Let P=(Q,Σ,Γ,δ,q0,qf). The construction of G is as follows:

  • The set of nonterminals is N={Ai,jqi,qjQ}.
  • The set of terminals is T=Σ.
  • The initial symbol is s=A0,f.
  • The production rules are of three types:
    • xΓ, if there exist transitions δ(qi,a,ε)=(qj,x) and δ(qk,b,x)=(ql,ε), then add Ai,laAj,kb to R.
    • qi,qj,qkQ, add Ai,jAi,kAk,j to R.
    • qiQ, add Ai,iε.

Limitation

Closure Properties

Context-free and Regular